NSF PAR Search | NSF Public Access Repository


Embodied visuomotor representation

https://doi.org/10.1038/s44182-025-00047-y

Burner, Levi; Fermüller, Cornelia; Aloimonos, Yiannis (December 2025, npj Robotics)
DeCroon, Guido (Ed.)
Abstract Imagine sitting at your desk, looking at objects on it. You do not know their exact distances from your eye in meters, but you can immediately reach out and touch them. Instead of an externally defined unit, your sense of distance is tied to your action’s embodiment. In contrast, conventional robotics relies on precise calibration to external units, with which vision and control processes communicate. We introduce Embodied Visuomotor Representation, a methodology for inferring distance in a unit implied by action. With it a robot without knowledge of its size, environmental scale, or strength can quickly learn to touch and clear obstacles within seconds of operation. Likewise, in simulation, an agent without knowledge of its mass or strength can successfully jump across a gap of unknown size after a few test oscillations. These behaviors mirror natural strategies observed in bees and gerbils, which also lack calibration in an external unit.
Free, publicly-accessible full text available December 1, 2026
Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation

Chen, Jingxi; Feng, Brandon; Cai, Haoming; Wang, Tianfu; Burner, Levi; Yuan, Dehao; Fermüller, Cornelia; Metzler, Christopher A; Aloimonos, Yiannis (June 2025, IEEE)

Video Frame Interpolation aims to recover realistic missing frames between observed frames, generating a high-frame-rate video from a low-frame-rate video. However, without additional guidance, the large motion between frames makes this problem ill-posed. Event-based Video Frame Interpolation (EVFI) addresses this challenge by using sparse, high-temporal-resolution event measurements as motion guidance. This guidance allows EVFI methods to significantly outperform frame-only methods. However, to date, EVFI methods have relied on a limited set of paired event-frame training data, severely limiting their performance and generalization capabilities. In this work, we overcome the limited data challenge by adapting pre-trained video diffusion models trained on internet-scale datasets to EVFI. We experimentally validate our approach on real-world EVFI datasets, including a new one that we introduce. Our method outperforms existing methods and generalizes across cameras far better than existing approaches.
Free, publicly-accessible full text available June 21, 2026
Active Human Pose Estimation via an Autonomous UAV Agent

https://doi.org/10.1109/IROS58592.2024.10801780

Chen, Jingxi; He, Botao; Singh, Chahat Deep; Fermüller, Cornelia; Aloimonos, Yiannis (October 2024, IEEE)

Full Text Available
Minimal perception: enabling autonomy in resource-constrained robots

https://doi.org/10.3389/frobt.2024.1431826

Singh, Chahat Deep; He, Botao; Fermüller, Cornelia; Metzler, Christopher; Aloimonos, Yiannis (September 2024, Frontiers in Robotics and AI)

The rapidly increasing capabilities of autonomous mobile robots promise to make them ubiquitous in the coming decade. These robots will continue to enhance efficiency and safety in novel applications such as disaster management, environmental monitoring, bridge inspection, and agricultural inspection. To operate autonomously without constant human intervention, even in remote or hazardous areas, robots must sense, process, and interpret environmental data using only onboard sensing and computation. This capability is made possible by advancements in perception algorithms, allowing these robots to rely primarily on their perception capabilities for navigation tasks. However, tiny robot autonomy is hindered mainly by sensors, memory, and computing due to size, area, weight, and power constraints. The bottleneck in these robots lies in the real-time perception in resource-constrained robots. To enable autonomy in robots of sizes that are less than 100 mm in body length, we draw inspiration from tiny organisms such as insects and hummingbirds, known for their sophisticated perception, navigation, and survival abilities despite their minimal sensor and neural system. This work aims to provide insights into designing a compact and efficient minimal perception framework for tiny autonomous robots from higher cognitive to lower sensor levels.
Full Text Available
Vector Symbolic Sub-objects Classifiers as Manifold Analogues

https://doi.org/10.1109/IJCNN60899.2024.10651219

Faraone, Renato; Sutor, Peter; Fermüller, Cornelia; Aloimonos, Yiannis (June 2024, IEEE)

Full Text Available
Generation of Novel Fall Animation with Configurable Attributes

https://doi.org/10.1145/3658852.3659087

Peng, Siyuan; Ladenheim, Kate; Shrestha, Snehesh; Fermüller, Cornelia (May 2024, ACM)
It takes less than half a second for a person to fall [8]. Capturing the essence of a fall from video or motion capture is difficult. More generally, generating realistic 3D human body motions from motion capture (MoCap) data is a significant challenge with potential applications in animation, gaming, and robotics. Current motion datasets contain single-labeled activities, which lack fine-grained control over the motion, particularly for actions as sparse, dynamic, and complex as falling. This work introduces a novel human falling dataset and a learned multi-branch, Attribute-Conditioned Variational Autoencoder model to generate novel falls. Our unique dataset introduces a new ontology of the motion into three phases: Impact, Glitch, and Fall. Each branch of the model learns each phase separately and the fusion layer learns to fuse the latent space together. Furthermore, we present data augmentation techniques and an inter-phase smoothness loss for natural plausible motion generation. We successfully generated high-quality images, validating the efficacy of our model in producing high-fidelity, attribute-conditioned human movements.
Full Text Available
AcTExplore: Active Tactile Exploration on Unknown Objects

https://doi.org/10.1109/ICRA57147.2024.10611667

Shahidzadeh, Amir-Hossein; Yoo, Seong Jong; Mantripragada, Pavan; Singh, Chahat Deep; Fermüller, Cornelia; Aloimonos, Yiannis (May 2024, IEEE)

Full Text Available
Microsaccade-inspired event camera for robotics

https://doi.org/10.1126/scirobotics.adj8124

He, Botao; Wang, Ze; Zhou, Yuan; Chen, Jingxi; Singh, Chahat Deep; Li, Haojia; Gao, Yuman; Shen, Shaojie; Wang, Kaiwei; Cao, Yanjun; et al (May 2024, Science Robotics)

Neuromorphic vision sensors or event cameras have made the visual perception of extremely low reaction time possible, opening new avenues for high-dynamic robotics applications. These event cameras’ output is dependent on both motion and texture. However, the event camera fails to capture object edges that are parallel to the camera motion. This is a problem intrinsic to the sensor and therefore challenging to solve algorithmically. Human vision deals with perceptual fading using the active mechanism of small involuntary eye movements, the most prominent ones called microsaccades. By moving the eyes constantly and slightly during fixation, microsaccades can substantially maintain texture stability and persistence. Inspired by microsaccades, we designed an event-based perception system capable of simultaneously maintaining low reaction time and stable texture. In this design, a rotating wedge prism was mounted in front of the aperture of an event camera to redirect light and trigger events. The geometrical optics of the rotating wedge prism allows for algorithmic compensation of the additional rotational motion, resulting in a stable texture appearance and high informational output independent of external motion. The hardware device and software solution are integrated into a system, which we call artificial microsaccade–enhanced event camera (AMI-EV). Benchmark comparisons validated the superior data quality of AMI-EV recordings in scenarios where both standard cameras and event cameras fail to deliver. Various real-world experiments demonstrated the potential of the system to facilitate robotics perception both for low-level and high-level vision tasks.
Full Text Available
Ajna: Generalized deep uncertainty for minimal perception on parsimonious robots

https://doi.org/10.1126/scirobotics.add5139

Sanket, Nitin J.; Singh, Chahat Deep; Fermüller, Cornelia; Aloimonos, Yiannis (August 2023, Science Robotics)

Robots are active agents that operate in dynamic scenarios with noisy sensors. Predictions based on these noisy sensor measurements often lead to errors and can be unreliable. To this end, roboticists have used fusion methods using multiple observations. Lately, neural networks have dominated the accuracy charts for perception-driven predictions for robotic decision-making and often lack uncertainty metrics associated with the predictions. Here, we present a mathematical formulation to obtain the heteroscedastic aleatoric uncertainty of any arbitrary distribution without prior knowledge about the data. The approach has no prior assumptions about the prediction labels and is agnostic to network architecture. Furthermore, our class of networks, Ajna, adds minimal computation and requires only a small change to the loss function while training neural networks to obtain uncertainty of predictions, enabling real-time operation even on resource-constrained robots. In addition, we study the informational cues present in the uncertainties of predicted values and their utility in the unification of common robotics problems. In particular, we present an approach to dodge dynamic obstacles, navigate through a cluttered scene, fly through unknown gaps, and segment an object pile, without computing depth but rather using the uncertainties of optical flow obtained from a monocular camera with onboard sensing and computation. We successfully evaluate and demonstrate the proposed Ajna network on four aforementioned common robotics and computer vision tasks and show comparable results to methods directly using depth. Our work demonstrates a generalized deep uncertainty method and demonstrates its utilization in robotics applications.
Full Text Available
TTCDist: Fast Distance Estimation From an Active Monocular Camera Using Time-to-Contact

https://doi.org/10.1109/ICRA48891.2023.10160683

Burner, Levi; Sanket, Nitin J.; Fermüller, Cornelia; Aloimonos, Yiannis (May 2023, 2023 IEEE International Conference on Robotics and Automation (ICRA 2023))

Distance estimation from vision is fundamental for a myriad of robotic applications such as navigation, manipulation, and planning. Inspired by the mammal’s visual system, which gazes at specific objects, we develop two novel constraints relating time-to-contact, acceleration, and distance that we call the τ-constraint and Φ-constraint. They allow an active (moving) camera to estimate depth efficiently and accurately while using only a small portion of the image. The constraints are applicable to range sensing, sensor fusion, and visual servoing. We successfully validate the proposed constraints with two experiments. The first applies both constraints in a trajectory estimation task with a monocular camera and an Inertial Measurement Unit (IMU). Our methods achieve 30-70% less average trajectory error while running 25× and 6.2× faster than the popular Visual-Inertial Odometry methods VINS-Mono and ROVIO, respectively. The second experiment demonstrates that when the constraints are used for feedback with efference copies, the resulting closed-loop system’s eigenvalues are invariant to scaling of the applied control signal. We believe these results indicate the potential of the τ- and Φ-constraints as the basis of robust and efficient algorithms for a multitude of robotic applications.
Full Text Available
